class: center, middle, inverse, title-slide

# Large-Scale Time Series Forecasting
### Thiyanga S. Talagala
### R Ladies Bergen, Norway
### 24 June 2021

---
background-image: url(img/jhu.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/jhu.png)
background-position: 50% 60%
background-size: contain

Let's visualize the coronavirus pandemic!
---
background-image: url(img/coronavirus.png)
background-size: 90px
background-position: 100% 6%

# Data: coronavirus package

.pull-left[
```r
# Released version from CRAN
install.packages("coronavirus")
# Or the development version from GitHub
devtools::install_github("RamiKrispin/coronavirus")
```
]

.pull-right[
```r
library(coronavirus)
head(coronavirus, 8)
```

```
        date province             country       lat      long      type cases
1 2020-01-22                  Afghanistan  33.93911  67.70995 confirmed     0
2 2020-01-22                      Albania  41.15330  20.16830 confirmed     0
3 2020-01-22                      Algeria  28.03390   1.65960 confirmed     0
4 2020-01-22                      Andorra  42.50630   1.52180 confirmed     0
5 2020-01-22                       Angola -11.20270  17.87390 confirmed     0
6 2020-01-22          Antigua and Barbuda  17.06080 -61.79640 confirmed     0
7 2020-01-22                    Argentina -38.41610 -63.61670 confirmed     0
8 2020-01-22                      Armenia  40.06910  45.03820 confirmed     0
```
]

---
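## Where `confirmed` comes from (a sketch)

The later slides use a tibble named `confirmed` (columns `country`, `date`, `cases`, grouped by `country`), but its construction is not shown. A plausible reading, and only an assumption, is that it keeps the `"confirmed"` rows of `coronavirus` and sums cases over provinces for each country and day; this would match the printed size, since 193 countries × 492 days = 94,956 rows. The sketch below runs the same steps on a small made-up data frame (`toy` and `confirmed_toy` are hypothetical names) so that it works without downloading the data.

```r
library(dplyr)

# Hypothetical toy data with the same columns as `coronavirus`
# (lat/long omitted); the values are made up for illustration.
toy <- data.frame(
  date     = as.Date("2020-01-22") + c(0, 0, 1, 1, 0, 1, 0, 1),
  province = c("A", "B", "A", "B", "", "", "", ""),
  country  = c("X", "X", "X", "X", "Y", "Y", "Y", "Y"),
  type     = c(rep("confirmed", 6), "death", "death"),
  cases    = c(1, 2, 3, 4, 5, 6, 0, 1)
)

# Assumed recipe: keep confirmed cases, then sum over provinces
# so that each country has one row per day.
confirmed_toy <- toy %>%
  filter(type == "confirmed") %>%
  group_by(country, date) %>%
  summarise(cases = sum(cases), .groups = "drop_last")  # stays grouped by country

confirmed_toy
```

With the real data, replacing `toy` by `coronavirus` should give a table of the same shape as the one printed on the later slides, provided the assumption above holds.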
---
class: split-70 hide-slide-number
background-image: url("img/jhu.png")
background-size: cover

.column.slide-in-left[
.sliderbox.vmiddle.shade_main.center[
.font5[Time Series Features]]]
.column[
]

---

# Time series features

Transform a given time series `\(y=\{y_1, y_2, \cdots, y_n\}\)` to a feature vector `\(F = (f_1(y), f_2(y), \cdots, f_p(y))'\)`.

## Examples of time series features

- strength of trend
- strength of seasonality
- lag-1 autocorrelation
- spectral entropy
- proportion of zeros

More information:

---
class: split-two white

.column.bg-main1[.content.vmiddle.center[
## Time-domain representation

]]
.column.bg-main2[.content.vmiddle.center[
## Feature-domain representation

]]

---
class: split-two white

.column.bg-white[.content[
```r
library(tsibble)
norway <- confirmed %>% filter(country == "Norway")
```
]]
.column.bg-white[.content[
]]

---
class: split-two white

.column.bg-white[.content[
```r
library(tsibble)
norway <- confirmed %>% filter(country == "Norway")
*norway.tsibble <- tsibble(
*  date = as.Date("2020-01-22") + 0:491,
*  Observation = norway$cases,
*  index = date)
```
]]
.column.bg-white[.content[
]]

---
class: split-two white

.column.bg-white[.content[
```r
library(tsibble)
norway <- confirmed %>% filter(country == "Norway")
norway.tsibble <- tsibble(
  date = as.Date("2020-01-22") + 0:491,
  Observation = norway$cases,
  index = date)
*norway.tsibble
```

```
# A tsibble: 492 x 2 [1D]
   date       Observation
   <date>           <dbl>
 1 2020-01-22           0
 2 2020-01-23           0
 3 2020-01-24           0
 4 2020-01-25           0
 5 2020-01-26           0
 6 2020-01-27           0
 7 2020-01-28           0
 8 2020-01-29           0
 9 2020-01-30           0
10 2020-01-31           0
# … with 482 more rows
```
]]
.column.bg-white[.content[
]]

---
class: split-two white

.column.bg-white[.content[
```r
library(tsibble)
norway <- confirmed %>% filter(country == "Norway")
norway.tsibble <- tsibble(
  date = as.Date("2020-01-22") + 0:491,
  Observation = norway$cases,
  index = date)
norway.tsibble
```

```
# A tsibble: 492 x 2 [1D]
   date       Observation
   <date>           <dbl>
 1 2020-01-22           0
 2 2020-01-23           0
 3 2020-01-24           0
 4 2020-01-25           0
 5 2020-01-26           0
 6 2020-01-27           0
 7 2020-01-28           0
 8 2020-01-29           0
 9 2020-01-30           0
10 2020-01-31           0
# … with 482 more rows
```
]]
.column.bg-white[.content[
```r
*library(fable)
*autoplot(norway.tsibble)
```
]]

---

## Compute features

```r
*library(feasts)
*norway.tsibble %>%
  features(Observation, feature_set(tags =
*    c("decomposition", "intermittent", "autocorrelation"))) %>%
  as.data.frame()
```

```
  trend_strength seasonal_strength_week seasonal_peak_week seasonal_trough_week
1      0.9158947              0.4880091                  0                    5
  spikiness linearity curvature stl_e_acf1 stl_e_acf10     acf1    acf10
1   5673.13  4291.447  767.7658  -0.206399   0.1942277 0.853502 6.380439
  diff1_acf1 diff1_acf10 diff2_acf1 diff2_acf10 season_acf1     pacf5
1  -0.322936   0.1896912 -0.5713678   0.3516748   0.8460248 0.9376168
  diff1_pacf5 diff2_pacf5 season_pacf zero_run_mean nonzero_squared_cv
1    0.453038    1.107245   0.2593937      6.833333          0.9778515
  zero_start_prop zero_end_prop
1      0.07113821             0
```

---
class: split-two white

.column.bg-white[.content[
tibble

```r
confirmed
```

```
# A tibble: 94,956 x 3
# Groups:   country [193]
   country     date       cases
   <chr>       <date>     <dbl>
 1 Afghanistan 2020-01-22     0
 2 Afghanistan 2020-01-23     0
 3 Afghanistan 2020-01-24     0
 4 Afghanistan 2020-01-25     0
 5 Afghanistan 2020-01-26     0
 6 Afghanistan 2020-01-27     0
 7 Afghanistan 2020-01-28     0
 8 Afghanistan 2020-01-29     0
 9 Afghanistan 2020-01-30     0
10 Afghanistan 2020-01-31     0
# … with 94,946 more rows
```
]]

--

.column.bg-white[.content[
**ts**ibble

```r
confirmed.tsibble <- confirmed %>%
  as_tsibble(index = date, key = country)
confirmed.tsibble
```

```
# A tsibble: 94,956 x 3 [1D]
# Key:       country [193]
# Groups:    country [193]
   country     date       cases
   <chr>       <date>     <dbl>
 1 Afghanistan 2020-01-22     0
 2 Afghanistan 2020-01-23     0
 3 Afghanistan 2020-01-24     0
 4 Afghanistan 2020-01-25     0
 5 Afghanistan 2020-01-26     0
 6 Afghanistan 2020-01-27     0
 7 Afghanistan 2020-01-28     0
 8 Afghanistan 2020-01-29     0
 9 Afghanistan 2020-01-30     0
10 Afghanistan 2020-01-31     0
# … with 94,946 more rows
```
]]

---

## Features for all countries

```r
features.all <- confirmed.tsibble %>%
  features(cases, feature_set(tags =
    c("decomposition", "intermittent", "autocorrelation")))
features.all
```

```
# A tibble: 193 x 25
   country   trend_strength seasonal_strengt… seasonal_peak_we… seasonal_trough…
   <chr>              <dbl>             <dbl>             <dbl>            <dbl>
 1 Afghanis…          0.870             0.260                 6                5
 2 Albania            0.985             0.493                 2                6
 3 Algeria            0.991             0.308                 2                4
 4 Andorra            0.684             0.656                 6                5
 5 Angola             0.914             0.380                 1                6
 6 Antigua …          0.437             0.169                 2                5
 7 Argentina          0.980             0.774                 2                5
 8 Armenia            0.982             0.876                 2                6
 9 Australia          0.833             0.271                 1                3
10 Austria            0.984             0.712                 2                6
# … with 183 more rows, and 20 more variables: spikiness <dbl>,
#   linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>, stl_e_acf10 <dbl>,
#   acf1 <dbl>, acf10 <dbl>, diff1_acf1 <dbl>, diff1_acf10 <dbl>,
#   diff2_acf1 <dbl>, diff2_acf10 <dbl>, season_acf1 <dbl>, pacf5 <dbl>,
#   diff1_pacf5 <dbl>, diff2_pacf5 <dbl>, season_pacf <dbl>,
#   zero_run_mean <dbl>, nonzero_squared_cv <dbl>, zero_start_prop <dbl>,
#   zero_end_prop <dbl>
```

---
class: split-two white

.column.bg-white[.content[
## Feature-based visualization

```r
features.all %>%
  ggplot(aes(x = trend_strength, y = seasonal_strength_week)) +
  geom_point() +
  coord_equal() +
  xlim(c(0,1)) +
  ylim(c(0,1)) +
  labs(x = "Trend strength", y = "Seasonal strength") +
  theme(legend.position = "bottom")
```
]]
.column.bg-white[.content[
]]

---
class: center, middle

# Thank you!
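---

## Backup: two features by hand

The "Time series features" slide maps a series `\(y\)` to a vector of summary numbers. As a minimal sketch of what two of the listed features mean, the base R code below computes the proportion of zeros and the lag-1 autocorrelation for a short made-up series. The series `y` and the names `prop_zero`, `acf1_manual`, and `feature_vector` are illustrative assumptions; the `feasts` functions used in the talk may define or scale related features differently.

```r
# Hypothetical toy series (not taken from the coronavirus data)
y <- c(0, 0, 0, 1, 2, 4, 7, 11, 16, 22)
n <- length(y)
m <- mean(y)

# Proportion of zeros: share of observations equal to 0
prop_zero <- mean(y == 0)

# Lag-1 autocorrelation from the usual sample formula:
# sum over t of (y_t - mean)(y_{t-1} - mean), divided by sum of (y_t - mean)^2
acf1_manual <- sum((y[-1] - m) * (y[-n] - m)) / sum((y - m)^2)

# The same quantity from stats::acf (element 1 is lag 0, element 2 is lag 1)
acf1 <- acf(y, lag.max = 1, plot = FALSE)$acf[2]

# A two-element feature vector F = (f_1(y), f_2(y))'
feature_vector <- c(prop_zero = prop_zero, acf1 = acf1)
feature_vector
```

Each series, whatever its length, is reduced to a vector of the same size, which is what allows many series to be compared in one feature space.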